Sensory-Aware Multimodal Fusion for Word Semantic Similarity Estimation
Abstract
Traditional semantic models are disembodied from human perception and action. In this work, we address this problem by grounding the semantic representations of words in the acoustic and visual modalities. Specifically, we estimate multimodal word representations by fusing the auditory and visual modalities with the text modality. We employ middle and late fusion of representations, with a modality weight assigned to each unimodal representation. We also propose a fusion method that assigns different weights to each word, based on how relevant that word is to the audio and visual modalities. The proposed methods are evaluated on the task of computing semantic similarity between words. To our knowledge, this is the first work that combines text, audio, and visual features to compute multimodal semantic word representations. The multimodal models outperform the unimodal models, indicating the importance of multimodal fusion and perceptual grounding.
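The abstract does not give the fusion formulas, so the following is only a minimal sketch of what weighted middle and late fusion could look like, not the authors' implementation. It assumes that middle fusion concatenates L2-normalized unimodal vectors scaled by modality weights, that late fusion takes a weighted average of unimodal cosine similarities, and that word-specific weights for a pair are the mean of the two words' weights. All function names, modality keys, and numeric values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def l2_normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def middle_fusion(reps, weights):
    """Assumed middle fusion: concatenate the normalized unimodal vectors,
    each scaled by its modality weight, into one multimodal vector."""
    fused = []
    for modality, w in weights.items():
        fused.extend(w * x for x in l2_normalize(reps[modality]))
    return fused


def late_fusion_similarity(reps_a, reps_b, weights):
    """Assumed late fusion: weighted average of the unimodal similarities."""
    total = sum(weights.values())
    score = sum(w * cosine(reps_a[m], reps_b[m]) for m, w in weights.items())
    return score / total


def pair_weights(weights_a, weights_b):
    """Hypothetical word-specific weighting: average the two words' weights."""
    return {m: (weights_a[m] + weights_b[m]) / 2 for m in weights_a}


# Toy vectors, invented for illustration only.
dog = {"text": [1.0, 0.0], "audio": [1.0], "visual": [0.0, 1.0]}
cat = {"text": [0.0, 1.0], "audio": [1.0], "visual": [0.0, 1.0]}
modality_weights = {"text": 0.5, "audio": 0.25, "visual": 0.25}

late_score = late_fusion_similarity(dog, cat, modality_weights)
middle_score = cosine(middle_fusion(dog, modality_weights),
                      middle_fusion(cat, modality_weights))
```

In this sketch a word that is strongly tied to sound or imagery could be given larger audio or visual weights through `pair_weights`, which is one simple way to read the word-specific weighting described in the abstract.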
Similar resources
Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity
Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...
Multimodal Word Distributions
Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic i...
Network-Based Distributional Semantic Models
In this thesis, the unsupervised creation of language-agnostic Distributional Semantic Models (DSMs) using web harvested data is investigated for the problem of semantic similarity estimation. Semantic similarity can be regarded as the building block for numerous tasks of Natural Language Processing, e.g., affective text analysis and paraphrasing. The first part of the thesis deals with the con...
A comprehensive model of spoken word recognition must be multimodal: Evidence from studies of language mediated visual attention
When processing language, the cognitive system has access to information from a range of modalities (e.g. auditory, visual) to support language processing. Language mediated visual attention studies have shown sensitivity of the listener to phonological, visual, and semantic similarity when processing a word. In a computational model of language mediated visual attention, that models spoken wor...
Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings
Recently a “Bag-of-Audio-Words” approach was proposed [1] for the combination of lexical features with audio clips in a multimodal semantic representation, i.e., an Audio Distributional Semantic Model (ADSM). An important step towards the creation of ADSMs is the estimation of the semantic distance between clips in the acoustic space, which is especially challenging given the diversity of audio...
Publication date: 2017